Species Distribution Modeling of Citizen Science Data as a Classification Problem with Class-Conditional Noise
نویسندگان
چکیده
Species distribution models relate the geographic occurrence pattern of a species to environmental features and are used for a variety of scientific and management purposes. One source of data for building species distribution models is citizen science, in which volunteers report locations where they observed (or did not observe) sets of species. Since volunteers have variable levels of expertise, citizen science data may contain both false positives and false negatives in the location labels (present vs. absent) they provide, but many common modeling approaches for this task do not address these sources of noise explicitly. In this paper, we propose to formulate the species distribution modeling task as a classification problem with class-conditional noise. Our approach builds on other applications of class-conditional noise models to crowdsourced data, but we focus on leveraging features of the noise processes that are distinct from the class features. We describe the conditions under which the parameters of our proposed model are identifiable and apply it to simulated data and data from the eBird citizen science project.
منابع مشابه
Machine Learning and Citizen Science: Opportunities and Challenges of Human-Computer Interaction
Background and Aim: In processing large data, scientists have to perform the tedious task of analyzing hefty bulk of data. Machine learning techniques are a potential solution to this problem. In citizen science, human and artificial intelligence may be unified to facilitate this effort. Considering the ambiguities in machine performance and management of user-generated data, this paper aims to...
متن کاملRobust Portfolio Optimization with risk measure CVAR under MGH distribution in DEA models
Financial returns exhibit stylized facts such as leptokurtosis, skewness and heavy-tailness. Regarding this behavior, in this paper, we apply multivariate generalized hyperbolic (mGH) distribution for portfolio modeling and performance evaluation, using conditional value at risk (CVaR) as a risk measure and allocating best weights for portfolio selection. Moreover, a robust portfolio optimizati...
متن کاملروشی جدید برای عضویتدهی به دادهها و شناسایی نوفه و دادههای پرت با استفاده از ماشین بردار پشتیبان فازی
Support Vector Machine (SVM) is one of the important classification techniques, has been recently attracted by many of the researchers. However, there are some limitations for this approach. Determining the hyperplane that distinguishes classes with the maximum margin and calculating the position of each point (train data) in SVM linear classifier can be interpreted as computing a data membersh...
متن کاملClassification with Asymmetric Label Noise: Consistency and Maximal Denoising
In many real-world classification problems, the labels of training examples are randomly corrupted. Most previous theoretical work on classification with label noise assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. In this work, we give conditions that are necessary and sufficient for...
متن کاملAn application of Measurement error evaluation using latent class analysis
Latent class analysis (LCA) is a method of evaluating non sampling errors, especially measurement error in categorical data. Biemer (2011) introduced four latent class modeling approaches: probability model parameterization, log linear model, modified path model, and graphical model using path diagrams. These models are interchangeable. Latent class probability models express l...
متن کامل